Skip to content

Conversation

sayantn
Copy link
Contributor

@sayantn sayantn commented May 7, 2025

This PR changes how LLVM intrinsics are codegen

Explanation of the changes

Current procedure

This is the same for all functions, LLVM intrinsics are not treated specially

  • We get the LLVM Type of a function simply using the argument types. For example, the following function
    #[link_name = "llvm.sqrt.f32"]
    fn sqrtf32(a: f32) -> f32;
    will have LLVM type simply f32 (f32) due to the Rust signature

Pros

  • Simpler to implement, no extra complexity involved due to LLVM intrinsics

Cons

  • LLVM intrinsics have a well-defined signature, completely defined by their name (and if it is overloaded, the type parameters). So, this process of converting Rust signatures to LLVM signatures may not work, for example the following code generates LLVM IR without any problem
    #[link_name = "llvm.sqrt.f32"]
    fn sqrtf32(a: i32) -> f32;
    but the generated LLVM IR is invalid, because it has wrong signature for the intrinsic (Godbolt, adding -Zverify-llvm-ir to it will fail compilation). I would expect this code to not compile at all instead of generating invalid IR.
  • LLVM intrinsics that have types in their signature that can't be accessed from Rust (notable examples are the AMX intrinsics that have the x86amx type, and (almost) all intrinsics that have vectors of i1 types) can't be linked to at all. This is a (major?) roadblock in the AMX and AVX512 support in stdarch.
  • If code uses an non-existing LLVM intrinsic, even -Zverify-llvm-ir won't complain. Eventually it will error out due to the non-existing function (courtesy of the linker). I don't think this is a behavior we want.

What this PR does

  • When linking to non-overloaded intrinsics, we use the function LLVMIntrinsicGetType to directly get the function type of the intrinsic from LLVM.
  • We then use this LLVM definition to verify the Rust signature, and emit a proper error if it doesn't match, instead of silently emitting invalid IR.
  • Lint if linking to deprecated or invalid LLVM intrinsics

Note

This PR only focuses on non-overloaded intrinsics, overloaded can be done in a future PR

Regardless, the undermentioned functionalities work for all intrinsics

  • If we can't find the intrinsic, we check if it has been AutoUpgraded by LLVM. If not, that means it is an invalid intrinsic, and we error out.
  • Don't allow intrinsics from other archs to be declared, e.g. error out if an AArch64 intrinsic is declared when we are compiling for x86

Pros

  • It is now not possible (or at least, it would require significantly more leaps and bounds) to introduce invalid IR using non-overloaded LLVM intrinsics.
  • As we are now doing the matching of Rust signatures to LLVM intrinsics ourselves, we can now add bypasses to enable linking to such non-Rust types (e.g. matching 8192-bit vectors to x86amx and injecting llvm.x86.cast.vector.to.tile and llvm.x86.cast.tile.to.vectors in callsite)

Note

I don't intend for these bypasses to be permanent. A better approach will be introducing a bf16 type in Rust, and allowing repr(simd) with bools to get Rust-native i1xNs. These are meant to be short-time, as I mentioned, "bypass"es. They shouldn't cause any major breakage even if removed, as link_llvm_intrinsics is perma-unstable.

This PR adds bypasses for bf16 (via i16), bf16xN (via i16xN) and i1xN (via iM, where M is the smallest power of 2 s.t. M >= N, unless N <= 4, where we use M = 8). This will unblock AVX512-VP2INTERSECT and a lot of bf16 intrinsics in stdarch. This PR also automatically destructures structs if the types don't exactly match (this is required for us to start emitting hard errors on mismmatches).

Cons

  • This only works for non-overloaded intrinsics (at least for now). Improving this to work with overloaded intrinsics too will involve significantly more work.

Possible ways to extend this to overloaded intrinsics (future)

Parse the mangled intrinsic name to get the type parameters

LLVM has a stable mangling of intrinsic names with type parameters (in LLVMIntrinsicCopyOverloadedName2), so we can parse the name to get the type parameters, and then just do the same thing.

Pros

  • For most intrinsics, this will work perfectly, and is a easy way to do this.

Cons

  • The LLVM mangling is not perfectly reversible. When we have TargetExt types or identified structs, their name is a part of the mangling, making it impossible to reverse. Even more complexities arise when there are unnamed identified structs, as LLVM adds more mangling to the names.
  • @nikic's work on LLVM intrinsics will remove the name mangling, making this approach impossible

Use the IITDescriptor table and the Rust function signature

We can use the base name to get the IITDescriptors of the corresponding intrinsic, and then manually implement the matching logic based on the Rust signature.

Pros

  • Doesn't have the above mentioned limitation of the parsing approach, has correct behavior even when there are identified structs and TargetExt types. Also, fun fact, Rust exports all struct types as literal structs (unless it is emitting LLVM IR, then it always uses named identified structs, with mangled names)

Cons

  • Doesn't actually use the type parameters in the name, only uses the base name and the Rust signature to get the llvm signature (although we can check that it is the correct name). It means there would be no way to (for example) link against llvm.sqrt.bf16 until we have bf16 types in Rust. Because if we are using u16s (or any other type) as bf16s, then the matcher will deduce that the signature is u16 (u16) not bf16 (bf16) (which would lead to an error because u16 is not a valid type parameter for llvm.sqrt), even though the intended type parameter is specified in the name.
  • Much more complex, and hard to maintain as LLVM gets new IITDescriptorKinds

These 2 approaches might give different results for same function. Let's take

#[link_name = "llvm.is.constant.bf16"]
fn foo(a: u16) -> bool

The name-based approach will decide that the type parameter is bf16, and the LLVM signature is i1 (bf16) and will inject some bitcasts at callsite.
The IITDescriptor-based approach will decide that the LLVM signature is i1 (u16), and will see that the name given doesn't match the expected name (llvm.is.constant.u16), and will error out.

Other things that this PR does

  • Removes unnecessary bitcasts in cg_llvm/builder::check_call (now renamed as cast_arguments due to its new counterpart cast_return). This was old code from when Rust used to pass non-erased lifetimes to LLVM.

Reviews are welcome, as this is my first time actually contributing to rustc

After CI is green, we would need a try build and a rustc-perf run.

@rustbot label T-compiler A-codegen A-LLVM
r? codegen

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. O-x86_64 Target: x86-64 processors (like x86_64-*) (also known as amd64 and x64) labels May 7, 2025
@rustbot
Copy link
Collaborator

rustbot commented May 8, 2025

Some changes occurred in compiler/rustc_codegen_ssa

cc @WaffleLapkin

@sayantn

This comment has been minimized.

@rust-log-analyzer

This comment has been minimized.

@rust-log-analyzer

This comment has been minimized.

@rustbot
Copy link
Collaborator

rustbot commented May 8, 2025

Some changes occurred in compiler/rustc_codegen_gcc

cc @antoyo, @GuillaumeGomez

@rust-log-analyzer

This comment has been minimized.

@sayantn sayantn changed the title Add auto-bitcasts from/to x86amx and i32x256 for AMX intrinsics Add auto-bitcasts from/to x86amx for i32x256 for AMX intrinsics May 8, 2025
@sayantn sayantn changed the title Add auto-bitcasts from/to x86amx for i32x256 for AMX intrinsics Add auto-bitcasts between x86amx and i32x256 for AMX intrinsics May 8, 2025
@sayantn

This comment has been minimized.

@dianqk
Copy link
Member

dianqk commented May 9, 2025

I think you can use LLVMGetIntrinsicDeclaration, LLVMGetIntrinsicDeclaration or some functions in Intrinsic.h in declare_raw_fn, as a reference: https://github.com/llvm/llvm-project/blob/d35ad58859c97521edab7b2eddfa9fe6838b9a5e/llvm/lib/AsmParser/LLParser.cpp#L330-L335.

@sayantn
Copy link
Contributor Author

sayantn commented May 9, 2025

That can be used to improve performance, I am not really focusing on performance in this PR. I want to currently emphasize the correctness of the codegen.

@sayantn
Copy link
Contributor Author

sayantn commented May 9, 2025

Oh wait, I probably misunderstood your comment, you meant using the llvm declaration by itself. Yeah, that would be better, thanks for the info. I will update the impl when I get the chance

@dianqk
Copy link
Member

dianqk commented May 15, 2025

Oh wait, I probably misunderstood your comment, you meant using the llvm declaration by itself. Yeah, that would be better, thanks for the info. I will update the impl when I get the chance

I think you can just focus on non-overloaded functions for this PR. Overloaded functions and type checking that checking Rust function signatures using LLVM defined can be subsequent PRs.

@rustbot author

@rustbot rustbot added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels May 15, 2025
@rustbot
Copy link
Collaborator

rustbot commented May 15, 2025

Reminder, once the PR becomes ready for a review, use @rustbot ready.

@sayantn

This comment has been minimized.

@rust-log-analyzer

This comment has been minimized.

@sayantn sayantn marked this pull request as draft May 19, 2025 07:23
@nikic
Copy link
Contributor

nikic commented May 19, 2025

@sayantn Taking the address of an intrinsic is invalid LLVM IR.

@sayantn
Copy link
Contributor Author

sayantn commented Jun 25, 2025

Do you have an example of an intrinsic currently using struct return?

There are actually quite a few of them in stdarch. One example otoh will be llvm.x86.encodekey128, used in the x86 intrinsic _mm_encodekey128_u8. The return type is {i8,i64x2,i64x2,i64x2,i64x2,i64x2,i64x2}, and due to alignment mismatch of i8 and i64x2, this cannot be a Rust struct type without #[repr(packed)]

Mostly the same, with the caveat that intrinsic have to be constructed from intrinsic ID and either the function type or type overloads. From a string name only via auto-upgrade. It's possible to verify whether a function type is valid for an intrinsic ID, but it's no longer associated with a unique name.

Will the intrinsic names still be mangled in IR with the overload types? Or is it a more fundamental change so that LLVM will automatically pull out the correct overloading when it sees an intrinsic call?

@nikic
Copy link
Contributor

nikic commented Jun 25, 2025

Mostly the same, with the caveat that intrinsic have to be constructed from intrinsic ID and either the function type or type overloads. From a string name only via auto-upgrade. It's possible to verify whether a function type is valid for an intrinsic ID, but it's no longer associated with a unique name.

Will the intrinsic names still be mangled in IR with the overload types? Or is it a more fundamental change so that LLVM will automatically pull out the correct overloading when it sees an intrinsic call?

The intrinsic names will not be mangled in IR. It will be printed using the intrinsic base name without the mangling suffix (internally it's just an entirely unnamed function, only identified by the intrinsic ID).

@sayantn
Copy link
Contributor Author

sayantn commented Jun 25, 2025

Ah, so all the overloads will be printed with just the base name in IR. This certainly complicates my work, and makes it a lot harder to add autocasts for overloaded intrinsics without going through the IITDesc table, which is a lot of effort.

antoyo pushed a commit to rust-lang/rustc_codegen_gcc that referenced this pull request Jun 28, 2025
Simplify implementation of Rust intrinsics by using type parameters in the cache

The current implementation of intrinsics have a lot of duplication to handle different overloads of overloaded LLVM intrinsic. This PR uses the **base name and the type parameters** in the cache instead of the full, overloaded name. This has the benefit that `call_intrinsic` doesn't need to provide the full name, rather the type parameters (which is most of the time more available). This uses `LLVMIntrinsicCopyOverloadedName2` to get the overloaded name from the base name and the type parameters, and only uses it to declare the function.

(originally was part of rust-lang/rust#140763, split off later)

`@rustbot` label A-codegen A-LLVM
r? codegen
github-actions bot pushed a commit to rust-lang/compiler-builtins that referenced this pull request Jul 12, 2025
Simplify implementation of Rust intrinsics by using type parameters in the cache

The current implementation of intrinsics have a lot of duplication to handle different overloads of overloaded LLVM intrinsic. This PR uses the **base name and the type parameters** in the cache instead of the full, overloaded name. This has the benefit that `call_intrinsic` doesn't need to provide the full name, rather the type parameters (which is most of the time more available). This uses `LLVMIntrinsicCopyOverloadedName2` to get the overloaded name from the base name and the type parameters, and only uses it to declare the function.

(originally was part of rust-lang/rust#140763, split off later)

`@rustbot` label A-codegen A-LLVM
r? codegen
@bors
Copy link
Collaborator

bors commented Jul 24, 2025

☔ The latest upstream changes (presumably #144062) made this pull request unmergeable. Please resolve the merge conflicts.

@rustbot rustbot added the F-autodiff `#![feature(autodiff)]` label Aug 28, 2025
@rustbot
Copy link
Collaborator

rustbot commented Aug 28, 2025

Some changes occurred in compiler/rustc_codegen_llvm/src/llvm/enzyme_ffi.rs

cc @ZuseZ4

@rustbot
Copy link
Collaborator

rustbot commented Aug 28, 2025

This PR was rebased onto a different master commit. Here's a range-diff highlighting what actually changed.

Rebasing is a normal part of keeping PRs up to date, so no action is needed—this note is just to help reviewers.

@rust-log-analyzer

This comment has been minimized.

@sayantn
Copy link
Contributor Author

sayantn commented Aug 28, 2025

@rustbot ready

Sorry for the inactivity, got busy with other things. I have removed the AMX autocasting part, but couldn't remove the bf16(xN) and i1xN autocasts due to complicated interactions with AutoUpgrade.

  • Some intrinsics that used iN before now use i1xN (e.g. llvm.x86.avx512.mask.cmp intrinsics)
  • Some intrinsics that used u16xN before now use i16xN

So the autoupgrade script upgrades the old versions to the new ones. But I encountered some interesting error messages if I try to delegate it to AutoUpgrade (while compiling stdarch)

rust-lld: error: undefined symbol: llvm.x86.avx512.mask.cmp.ps.512.old
>>> referenced by avx512f.rs:30232 (crates/core_arch/src/x86/avx512f.rs:30232)
>>> /home/schak/Documents/Rust/stdarch/target/debug/deps/core_arch-f4dbab0d28f30c86.48fby2y2clumk4xc11tjdgosq.04ns6r1.rcgu.o:(core_arch::core_arch::x86::avx512f::_mm512_cmp_ps_mask::h3001578843c2f684)
>>> referenced by avx512f.rs:30232 (crates/core_arch/src/x86/avx512f.rs:30232)
>>> /home/schak/Documents/Rust/stdarch/target/debug/deps/core_arch-f4dbab0d28f30c86.48fby2y2clumk4xc11tjdgosq.04ns6r1.rcgu.o:(core_arch::core_arch::x86::avx512f::_mm512_cmp_ps_mask::h3b8086b16256c396)
>>> referenced by avx512f.rs:30232 (crates/core_arch/src/x86/avx512f.rs:30232)
>>> /home/schak/Documents/Rust/stdarch/target/debug/deps/core_arch-f4dbab0d28f30c86.48fby2y2clumk4xc11tjdgosq.04ns6r1.rcgu.o:(core_arch::core_arch::x86::avx512f::_mm512_cmp_ps_mask::h3cf3b9fe5905466d)
>>> referenced 17 more times

@rust-log-analyzer

This comment has been minimized.

@sayantn sayantn changed the title Change codegen of LLVM intrinsics to be name-based, and add llvm linkage support for bf16(xN), i1xN and x86amx Change codegen of LLVM intrinsics to be name-based, and add llvm linkage support for bf16(xN) and i1xN Sep 2, 2025
@rust-log-analyzer

This comment has been minimized.

@rust-log-analyzer

This comment has been minimized.

@rust-log-analyzer
Copy link
Collaborator

The job aarch64-gnu-llvm-19-2 failed! Check out the build log: (web) (plain enhanced) (plain)

Click to see the possible cause of the failure (guessed by this bot)
[TIMING:end] tool::LintDocs { compiler: Compiler { stage: 0, host: aarch64-unknown-linux-gnu, forced_compiler: false }, target: aarch64-unknown-linux-gnu } -- 0.000
##[group]Running stage2 lint-docs (stage1 -> stage2, aarch64-unknown-linux-gnu)
warning: the code example in lint `unknown_llvm_intrinsic` in /checkout/compiler/rustc_lint_defs/src/builtin.rs failed to generate the expected output: did not find lint `unknown_llvm_intrinsic` in output of example, got:

error: expected one of `!` or `[`, found keyword `fn`
##[error]  --> lint_example.rs:13:1
   |
12 | #
   |  - expected one of `!` or `[`
13 | fn main() {
---
Generating lint docs (aarch64-unknown-linux-gnu)
##[group]Running stage2 lint-docs (stage1 -> stage2, aarch64-unknown-linux-gnu)
error: failed to test example in lint docs for `unknown_llvm_intrinsic` in /checkout/compiler/rustc_lint_defs/src/builtin.rs:5147: did not find lint `unknown_llvm_intrinsic` in output of example, got:

error: expected one of `!` or `[`, found keyword `fn`
##[error]  --> lint_example.rs:13:1
   |
12 | #
   |  - expected one of `!` or `[`
13 | fn main() {
---



This error was generated by the lint-docs tool.
This tool extracts documentation for lints from the source code and places
them in the rustc book. See the declare_lint! documentation
https://doc.rust-lang.org/nightly/nightly-rustc/rustc_lint_defs/macro.declare_lint.html
for an example of the format of documentation this tool expects.

To re-run these tests, run: ./x.py test --keep-stage=0 src/tools/lint-docs
The --keep-stage flag should be used if you have already built the compiler
and are only modifying the doc comments to avoid rebuilding the compiler.

Command `/checkout/obj/build/aarch64-unknown-linux-gnu/stage1-tools-bin/lint-docs --build-rustc-stage 1 --src /checkout/compiler --out /checkout/obj/build/aarch64-unknown-linux-gnu/md-doc/rustc/src/lints --rustc /checkout/obj/build/aarch64-unknown-linux-gnu/stage1/bin/rustc --rustc-target aarch64-unknown-linux-gnu --validate` failed with exit code 1
Created at: src/bootstrap/src/core/build_steps/tool.rs:1548:23
Executed at: src/bootstrap/src/core/build_steps/doc.rs:1325:13

Command has failed. Rerun with -v to see more details.
Bootstrap failed while executing `--stage 2 test --skip tests --skip coverage-map --skip coverage-run --skip library --skip tidyselftest`
Build completed unsuccessfully in 0:40:31
  local time: Tue Sep  2 08:31:56 UTC 2025
  network time: Tue, 02 Sep 2025 08:31:56 GMT
##[error]Process completed with exit code 1.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-codegen Area: Code generation A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. A-run-make Area: port run-make Makefiles to rmake.rs F-autodiff `#![feature(autodiff)]` S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants